AITopics | neural tangent kernel

Collaborating Authors

neural tangent kernel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability

Neural Information Processing SystemsJun-11-2026, 10:32:41 GMT

The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Neural Information Processing SystemsApr-25-2026, 02:24:01 GMT

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Globally Convergent Variational Inference

Neural Information Processing SystemsMar-18-2026, 23:09:30 GMT

In variational inference (VI), an approximation of the posterior distribution is selected from a family of distributions through numerical optimization. With the most common variational objective function, known as the evidence lower bound (ELBO), only convergence to a optimum can be guaranteed. In this work, we instead establish the convergence of a particular VI method. This VI method, which may be considered an instance of neural posterior estimation (NPE), minimizes an expectation of the inclusive (forward) KL divergence to fit a variational distribution that is parameterized by a neural network. Our convergence result relies on the neural tangent kernel (NTK) to characterize the gradient dynamics that arise from considering the variational objective in function space. In the asymptotic regime of a fixed, positive-definite neural tangent kernel, we establish conditions under which the variational objective admits a unique solution in a reproducing kernel Hilbert space (RKHS). Then, we show that the gradient descent dynamics in function space converge to this unique function. In ablation studies and practical problems, we demonstrate that our results explain the behavior of NPE in non-asymptotic finite-neuron settings, and show that NPE outperforms ELBO-based optimization, which often converges to shallow local optima.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

A generalized neural tangent kernel for surrogate gradient learning

Neural Information Processing SystemsMar-18-2026, 05:21:04 GMT

State-of-the-art neural network training methods depend on the gradient of the network function. Therefore, they cannot be applied to networks whose activation functions do not have useful derivatives, such as binary and discrete-time spiking neural networks. To overcome this problem, the activation function's derivative is commonly substituted with a surrogate derivative, giving rise to surrogate gradient learning (SGL). This method works well in practice but lacks theoretical foundation.The neural tangent kernel (NTK) has proven successful in the analysis of gradient descent. Here, we provide a generalization of the NTK, which we call the surrogate gradient NTK, that enables the analysis of SGL.

activation function, artificial intelligence, machine learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Neural Information Processing SystemsMar-16-2026, 20:56:44 GMT

At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the so-called kernel gradient associated with a new object, which we call the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence of the training can then be related to the positive-definiteness of the limiting NTK. We then focus on the setting of least-squares regression and show that in the infinite-width limit, the network function follows a linear differential equation during training. The convergence is fastest along the largest kernel principal components of the input data with respect to the NTK, hence suggesting a theoretical motivation for early stopping. Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Add feedback

abd3c6b90e474ec50a52c446926b00be-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 08:59:17 GMT

graph, representation, temp-g 3, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(13 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.93)
Education (0.68)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

Simon S. Du, Kangcheng Hou, Russ R. Salakhutdinov, Barnabas Poczos, Ruosong Wang, Keyulu Xu

Neural Information Processing SystemsFeb-12-2026, 10:17:40 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, gntk, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

dee8f820d86aca28ab0328a9243020f9-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 09:12:02 GMT

algorithm, gnn, graph, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.94)

Add feedback

A Overview

Neural Information Processing SystemsFeb-11-2026, 09:36:32 GMT

C.1 More details on datasets and models being explained For text datasets, We employ three-layer convolutional neural networks. Due to the large number of parameters in the target model (over 1.6 million), we do not implement Surrogate derivative: We first transform Eqn. We then follow Y eh et al. T arget derivative: We directly adopt Eqn. ( t 1) ( t 1) For last layer embeddings and neural tangent kernels, we directly use Eqn. We train 5 models simultaneously on a single GPU for speed up.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Filters

Collaborating Authors

neural tangent kernel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Globally Convergent Variational Inference

A generalized neural tangent kernel for surrogate gradient learning

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

abd3c6b90e474ec50a52c446926b00be-Paper-Conference.pdf

cf9dc5e4e194fc21f397b4cac9cc3ae9-Paper.pdf

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

dee8f820d86aca28ab0328a9243020f9-Paper-Conference.pdf

A Overview